- Git in RStudio
- Keyboard shortcut for the pipe: Ctrl+Shift+M (Cmd+Shift+M on macOS) inserts %>%
- RStudio Projects and the Knit directory
NAFC | Fisheries and Oceans Canada | August 16, 2018
"The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data." - John Tukey
Exploratory data analysis (EDA) is sadly neglected in most statistics courses
…
…
readr: a package for reading flat files into R
library(readr)
cape <- read_csv("data/capelin_condition_maturation_v1.csv") # read a comma-separated file
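A self-contained sketch of the same step, assuming the data folder is not available: it writes a tiny hypothetical CSV to a temporary file and reads it back. Base R's read.csv() is swapped in for readr::read_csv() so the example runs without extra packages; the column names and values are made up for illustration.

```r
# Hypothetical capelin-like records (made-up values, illustration only)
tmp <- tempfile(fileext = ".csv")
writeLines(c("year,length,weight",
             "2013,152,18.2",
             "2014,148,0",
             "2015,161,21.7"), tmp)

# Base R reader; readr::read_csv(tmp) would return a tibble instead
cape_demo <- read.csv(tmp)
str(cape_demo)  # check that each column was parsed with the expected type
```

Running str() on the real file is a cheap first check: a column that should be numeric but arrives as character usually points to a stray text entry somewhere in the data.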
ggplot(data = abiotic, aes(x = temp_bottom, y = depth)) + geom_point()
filter(df3, year > 2012) %>% ggplot(aes(x = as.numeric(length))) + geom_dotplot() + facet_wrap(~year)
ggplot(data = df3) +
  geom_boxplot(aes(x = as.factor(year), y = length))
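Boxplots draw points beyond the whiskers individually. As a numeric companion, base R's boxplot.stats() applies the same rule boxplot() uses: values more than coef (default 1.5) times the hinge spread beyond the hinges are returned as outliers. The length vector below is hypothetical.

```r
# Hypothetical fish lengths (mm); the last value is a deliberate, typo-like outlier
lengths <- c(140, 145, 148, 150, 151, 153, 155, 158, 160, 230)

# Same rule boxplot() uses to decide which points to draw beyond the whiskers
bs <- boxplot.stats(lengths, coef = 1.5)
bs$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
bs$out    # values flagged as potential outliers
```

A flagged value is a prompt to check the raw data sheet, not an automatic reason to delete the record.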
## Step 1: Are there outliers in X and Y? (Cleveland dotplot)
df3$id <- seq_len(nrow(df3)) # observation order as a number, so the axis is continuous rather than sorted as text
filter(df3, year > 2012) %>% ggplot(aes(y = length, x = id)) + geom_point() + facet_wrap(~year)
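A Cleveland dotplot plots each value against its position in the data, so isolated points stand out. The sketch below uses base R's dotchart() on simulated lengths, plus a crude numeric flag; the |z| > 3 cut-off is an illustrative assumption, not a rule from the source.

```r
set.seed(1)
# Hypothetical lengths (mm): 99 plausible values plus one data-entry error at the end
len <- c(rnorm(99, mean = 150, sd = 5), 230)

# Cleveland dotplot: value on the x-axis, order in the data on the y-axis
dotchart(len, xlab = "Length (mm)", ylab = "Order of the data")

# Illustrative numeric companion: standardized values beyond 3 in absolute size
z <- (len - mean(len)) / sd(len)
which(abs(z) > 3)
```

In small samples a single extreme value inflates the standard deviation enough to hide itself from a z-score rule, which is one reason the plot is worth looking at rather than relying on a threshold alone.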
filter(df3, sex != 3) %>% ggplot(aes(x = as.factor(year), y = length)) + geom_boxplot() + facet_grid(rows = vars(sex))
ggplot(data = df3, aes(x = length)) + geom_histogram() + facet_wrap(~sex)
- Use QQ plots of the residuals after fitting the model: normality is an assumption about the residuals, not the raw data
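A sketch of checking residuals after fitting, on simulated data. The log-weight on log-length model and all parameter values are assumptions for illustration; qqnorm() and qqline() are base R.

```r
set.seed(42)
# Simulated length-weight data (hypothetical): linear on the log-log scale, normal error
length_mm <- runif(200, 100, 200)
weight_g  <- exp(-12 + 3 * log(length_mm) + rnorm(200, sd = 0.1))

fit <- lm(log(weight_g) ~ log(length_mm))
res <- residuals(fit)

# QQ plot of the residuals: points close to the line suggest approximately normal errors
qqnorm(res)
qqline(res)
```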
p <- ggplot(df3, aes(x = weight))
p + geom_histogram()
filter(df3, weight == 0)
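Listing zero weights is the first step; counting them shows whether they are a handful of recording errors or a substantial share of the data. A base R sketch on a hypothetical vector (dplyr is not used here):

```r
# Hypothetical weights (g); zeros are likely missing or mis-entered values
weight <- c(18.2, 0, 21.7, 19.4, 0, 22.1, 20.5, NA)

n_zero <- sum(weight == 0, na.rm = TRUE)   # number of exact zeros
p_zero <- n_zero / sum(!is.na(weight))     # share of non-missing values that are zero
which(weight == 0)                         # positions to check against the raw data
```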
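library(car) # scatterplotMatrix(); car 3.x argument names
scatterplotMatrix(~ ln_biomass_med + tice + meanCond_lag + surface_tows_lag2 + ps_sdTot_lag2,
                  data = cape,
                  regLine = list(method = lm), # linear fit in each panel
                  smooth = list(span = 0.5, spread = FALSE), # loess smoother, no spread lines
                  diagonal = list(method = "density")) # density estimate on the diagonal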
library(psych) # pairs.panels()
pairs.panels(cape[c("ln_biomass_med", "tice", "meanCond_lag", "surface_tows_lag2", "ps_sdTot_lag2")],
             method = "pearson", # correlation method
             hist.col = "#00AFBB", density = FALSE, # histograms without density curves
             ellipses = FALSE, # no correlation ellipses
             cex.labels = 1, cex.cor = 1)
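Pairwise correlations can miss collinearity that involves several covariates at once. Variance inflation factors capture it: VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing covariate j on all the other covariates. The sketch computes VIFs by hand in base R on simulated covariates (names hypothetical); car::vif() on a fitted model is the packaged alternative.

```r
set.seed(123)
n  <- 100
# Simulated covariates (hypothetical): x2 is nearly a copy of x1, x3 is independent
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)
x3 <- rnorm(n)
covs <- data.frame(x1, x2, x3)

# VIF_j = 1 / (1 - R^2_j), regressing covariate j on all the other covariates
vif_by_hand <- function(df) {
  sapply(names(df), function(v) {
    f  <- reformulate(setdiff(names(df), v), response = v)
    r2 <- summary(lm(f, data = df))$r.squared
    1 / (1 - r2)
  })
}

round(cor(covs), 2)  # pairwise view
vif_by_hand(covs)    # x1 and x2 should be large, x3 close to 1
```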
ggplot(data=cape) + geom_point(aes(x=tice, y = ln_biomass_med))
ggplot(data=df3) + geom_point(aes(x=log10(length), y = log10(weight), colour=nafo_div)) + facet_wrap(~nafo_div)
Think hard about this: they can seriously complicate the analysis!
p <- ggplot(data = cape, aes(x = tice, y = ln_biomass_med)) + geom_point()
p <- p + geom_smooth(method = lm, se = FALSE) + facet_wrap(~ cut_number(meanCond_lag, 3))
p
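The faceted plot conditions the tice/biomass relationship on terciles of meanCond_lag; if the fitted slopes differ clearly between panels, an interaction term may be worth considering. A base R sketch on simulated data (variable names hypothetical): coplot() draws a conditioning plot, and the loop fits one slope per equal-count tercile, mirroring cut_number(..., 3).

```r
set.seed(7)
n <- 150
# Simulated data (hypothetical): the effect of x on y grows with z, i.e. an interaction
x <- runif(n, 0, 10)
z <- runif(n, 0, 1)
y <- 1 + 2 * x * z + rnorm(n)
dat <- data.frame(x, y, z)

# Conditioning plot; note coplot() uses overlapping intervals of z by default (overlap = 0.5)
coplot(y ~ x | z, data = dat, number = 3, panel = panel.smooth)

# Non-overlapping, equal-count terciles of z, like ggplot2::cut_number(z, 3)
dat$z_grp <- cut(dat$z, breaks = quantile(dat$z, probs = seq(0, 1, length.out = 4)),
                 include.lowest = TRUE)

# One slope per tercile; clearly different slopes point to an interaction
slopes <- sapply(split(dat, dat$z_grp), function(d) coef(lm(y ~ x, data = d))[["x"]])
slopes
```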
"Testing for independence is not always easy" - Zuur et al. 2010
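For data collected over time (or along a transect), one common check is the autocorrelation function (ACF) of the model residuals. A sketch with two simulated residual series using base R's acf() and arima.sim(); the AR(1) coefficient of 0.8 is an arbitrary choice for illustration.

```r
set.seed(2018)
n <- 500
# Simulated residual series (hypothetical): one independent, one autocorrelated
indep <- rnorm(n)
ar1   <- as.numeric(arima.sim(model = list(ar = 0.8), n = n))

# Lag-1 autocorrelation; values near 0 are consistent with independence
lag1 <- function(r) acf(r, lag.max = 1, plot = FALSE)$acf[2]
c(independent = lag1(indep), autocorrelated = lag1(ar1))

# Bars outside the dashed confidence bands suggest dependence at that lag
acf(ar1, main = "ACF of autocorrelated residuals")
```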
Make it reproducible (i.e., write the analysis up as an R Markdown *.Rmd file)
Your turn: do this with the trawl_abiotic data
Cleveland. 1994. The Elements of Graphing Data. Summit (NJ): Hobart Press.
Ieno and Zuur. 2015. A Beginner's Guide to Data Exploration and Visualisation with R. Highland Statistics Ltd. http://highstat.com/index.php/beginner-s-guide-to-data-exploration-and-visualisation
Tukey. 1977. Exploratory Data Analysis. Reading (MA): Addison-Wesley.
Yeager et al. 2007. Graphical methods for exploratory analysis of complex data sets. BioScience 57: 673-679.
Zuur et al. 2010. A protocol for data exploration to avoid common statistical problems. Methods in Ecology and Evolution 1: 3-14.